Practical Aspects for Designing Statistically Optimal Experiments

Mark  J.  Anderson,  Patrick  J.  Whitcomb 
Stat-Ease,  Inc.,  Minneapolis,  MN  USA 

Due  to  operational  or  physical  considerations,  standard  factorial  and  response  surface  method  (RSM)  design  of 
experiments  (DOE)  often  prove  to  be  unsuitable.  In  such  cases  a  computer-generated  statistically-optimal  design 
fills the breach. This article explores vital mathematical properties for evaluating alternative designs with a focus on
what  is  really  important  for  industrial  experimenters.  To  assess  "goodness  of  design"  such  evaluations  must 
consider  the  model  choice,  specific  optimality  criteria  (in  particular  D  and  I),  precision  of  estimation  based  on  the 
fraction  of  design  space  (FDS),  the  number  of  runs  to  achieve  required  precision,  lack-of-fit  testing,  and  so  forth. 
With  a  focus  on  RSM,  all  these  issues  are  considered  at  a  practical  level,  keeping  engineers  and  scientists  in  mind. 
This  brings  to  the  forefront  such  considerations  as  subject-matter  knowledge  from  first  principles  and  experience, 
factor  choice  and  the  feasibility  of  the  experiment  design. 

Key  words:  design  of  experiments,  optimal  design,  response  surface  methods,  fraction  of  design  space. 


Introduction 

Statistically  optimal  designs  emerged  over  a  half  century  ago  (Kiefer,  1959)  to  provide  these  advantages 
over  classical  templates  for  factorials  and  RSM: 

•  Efficiently  filling  out  an  irregularly  shaped  experimental  region  such  as  that  shown  in  Figure  1  (Anderson 
and  Whitcomb,  2005), 


Figure 1. Example of irregularly-shaped experimental region (a molding process; horizontal axis: A - Temperature).

•  Minimizing  the  runs  to  just  what  is  needed  to  fit  the  assumed  polynomial  model, 

•  Accommodating unusual requirements concerning either the number of blocks or the number of runs per block,

•  Handling  a  combination  of  factor  types,  such  as  continuous,  discrete,  categorical  and  mixture. 

Over time a number of criteria labeled alphabetically became favored by industrial experimenters for
optimal designs (Box and Draper, 2007), of which two will be covered in this article: I-optimal and D-optimal.
Our  focus  will  be  kept  to  RSM. 

What  is  a  "Good"  Experiment  Design? 

To  answer  this  important  question,  let's  start  with  a  wish-list  for  choosing  a  "suitable"  experiment  design 
derived  from  one  provided  by  George  Box  (Box,  1982) — the  co-inventor  of  RSM  (Box  and  Wilson,  1951): 

(1)  Allow  the  chosen  polynomial  to  be  estimated  well. 

(2)  Give  sufficient  information  to  allow  a  test  for  lack  of  fit  by 

a.  Having  more  unique  design  points  than  model  coefficients,  and 

b.  Providing  an  estimate  of  "pure"  error,  i.e.,  replicates. 

(3)  Remain  insensitive  to  outliers,  influential  values  and  bias  from  model  misspecification. 

(4)  Be  robust  to  errors  in  control  of  the  factor  levels. 

(5)  Permit  blocking  and  sequential  experimentation. 

(6)  Provide  a  check  on  variance  assumptions,  e.g.,  studentized  residuals  are  normal  with  a  mean  of  zero 
and  constant  variance. 

(7)  Generate  useful  information  throughout  the  region  of  interest,  i.e.,  provide  a  good  distribution  of 
standard  error  of  predictions. 

(8)  Do  not  contain  an  excessively  large  number  of  trials. 

When applying RSM, industrial experimenters generally choose as step one a quadratic polynomial, which
is remarkably versatile for empirical modeling. For this purpose, the central composite design (CCD), also
known as the Box-Wilson design in honor of its developers, scores well on all the desired attributes. However,
standard layouts like the CCD are not a good fit for non-cuboidal regions such as that illustrated in Figure 1.
For  situations  like  this  or  others  spelled  out  in  the  Introduction,  optimal  designs  are  seemingly  the  panacea. 
However,  as  we  will  discuss  further,  you  had  best  keep  in  mind  that  "designing  an  experiment  should  involve 
balancing  multiple  objectives,  not  just  focusing  on  a  single  characteristic"  (Myers,  Montgomery  and 
Anderson-Cook,  2009). 
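
For reference, the quadratic polynomial usually fitted in RSM can be written in its general form for k factors as below. The notation is the conventional textbook one and is an assumption here, not a transcription from this article.

```latex
\hat{y} = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2
          + \sum_{i<j} \beta_{ij} x_i x_j
```

For one factor this reduces to three coefficients (intercept, linear and squared terms) and for two factors to six, which sets the minimum number of unique design points needed.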

Purely  Optimal  Designs — Comparing  I  Versus  D  as  the  Criterion 

Although  there  are  many  variations  on  the  theme  of  optimal  design,  the  following  two  criteria  are  the  ones 
primarily  used  in  industrial  experimentation: 

•  I-optimal (also known as "IV") to minimize the integral of the prediction variance over the experimental region,

•  D-optimal to minimize the volume of the confidence ellipsoid for the coefficients and thus maximize
information on the polynomial coefficients.
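
To make the two criteria concrete, the following minimal sketch evaluates both for a one-factor quadratic model on coded levels from -1 to +1. The two 12-run layouts (equal replication at three levels, and extra replication at the center) are illustrative assumptions and are not read from the figures of this article; the function names are hypothetical.

```python
# Sketch: D- and I-criteria for a one-factor quadratic model on coded levels in [-1, 1].
# The two 12-run layouts below are illustrative assumptions.

def model_row(x):
    """Model terms for y = b0 + b1*x + b11*x^2."""
    return [1.0, x, x * x]

def information_matrix(design):
    """X'X for a design given as a list of coded factor levels."""
    rows = [model_row(x) for x in design]
    return [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def inverse3(m):
    """Inverse of a 3x3 matrix via cofactors."""
    det = det3(m)
    inv = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            rows = [k for k in range(3) if k != i]
            cols = [k for k in range(3) if k != j]
            minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
                     - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
            inv[j][i] = (-1) ** (i + j) * minor / det  # transposed cofactor
    return inv

def d_criterion(design):
    """Determinant of X'X; larger means a smaller confidence ellipsoid."""
    return det3(information_matrix(design))

def i_criterion(design):
    """Average scaled prediction variance over [-1, 1]; smaller is better."""
    inv = inverse3(information_matrix(design))
    # Moments of the model terms for a uniform weighting on [-1, 1]:
    # E[1] = 1, E[x] = 0, E[x^2] = 1/3, E[x^3] = 0, E[x^4] = 1/5.
    moments = [[1.0, 0.0, 1 / 3], [0.0, 1 / 3, 0.0], [1 / 3, 0.0, 1 / 5]]
    return sum(inv[i][j] * moments[j][i] for i in range(3) for j in range(3))

d_layout = [-1] * 4 + [0] * 4 + [1] * 4   # equal replication at three levels
i_layout = [-1] * 3 + [0] * 6 + [1] * 3   # extra replication at the center

for name, layout in (("equal replication", d_layout), ("center-heavy", i_layout)):
    print(name, round(d_criterion(layout), 3), round(i_criterion(layout), 4))
```

Under these assumed layouts the equally replicated design has the larger determinant, while the center-heavy design has the lower average prediction variance, which is the trade-off between the two criteria.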

Rather  than  getting  mired  down  in  the  mathematical  details  for  these  two  criteria  and  the  multitude  of 
algorithms  for  applying  them  (readily  available  from  a  vast  array  of  references — some  already  cited  here),  we  will 
focus on how they differ in actual application to RSM experiments designed to fit quadratic polynomials.
For this purpose a good tool for comparison is the standard error (SE) plot, such as those
shown in Figure 2 for a 12-run RSM design on one factor where the points are picked I- versus D-optimally.

Figure 2. Standard error plots for I (left) versus D optimal (right) for a 12-run, one-factor RSM design.

Comparing  these  plots  side-by-side  provides  a  picture  of  the  relative  quality  of  predicted  response  at 
various  locations  spanning  the  experimental  region  (shown  here  in  terms  of  coded  units  from  the  center).  The 
greater  replication  of  points  by  the  I-criterion  is  desirable  for  RSM  because  it  lowers  the  standard  error  of 
prediction  at  the  center — the  point  of  greatest  interest — and  provides  a  fairly  flat  profile  for  a  broader  (relative 
to  the  D-criterion)  range  in  the  middle  of  the  experimental  region. 

Another  way  to  compare  designs  is  via  the  fraction  of  design  space  (FDS)  plot  (Anderson-Cook,  Borror 
and  Montgomery,  2009),  which  consists  of  a  single  line  for  a  given  design,  thus  allowing  display  of  the 
prediction  variance  for  several  designs  at  once.  Figure  3  lays  out  the  FDS  curves  for  the  aforementioned  12-run 
RSM design built with the competing criteria, I versus D.

Figure 3. FDS plots for I versus D optimal for a 12-run, one-factor RSM design.

The legend in Figure 3 provides benchmarks on the standard error minimums, averages and maximums.
For  RSM  purposes,  the  I-optimal  provides  a  desirable  tradeoff  of  being  higher  at  its  maximum  (not  good)  but 
lower  on  average  (good)  than  the  D-optimal  design. 
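
An FDS curve of this kind can be approximated numerically. The sketch below assumes the same two illustrative 12-run layouts as a stand-in for the designs in the figures, samples the coded range on an even grid rather than at random, and reports the standard error of the predicted mean in units of the process standard deviation. All names are hypothetical.

```python
# Sketch: fraction-of-design-space (FDS) curve for a one-factor quadratic model.

def quad_terms(x):
    return [1.0, x, x * x]

def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fds_curve(design, n_samples=2001):
    """Sorted standard errors over an even grid on [-1, 1].

    The value at position k is the standard error that a fraction
    (k + 1) / n_samples of the design space meets or betters.
    """
    rows = [quad_terms(p) for p in design]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    errors = []
    for k in range(n_samples):
        x = -1 + 2 * k / (n_samples - 1)
        f = quad_terms(x)
        z = solve(xtx, f)  # z = (X'X)^-1 f
        errors.append(sum(fi * zi for fi, zi in zip(f, z)) ** 0.5)
    return sorted(errors)

def summarize(curve):
    """Minimum, mean and maximum standard error, as in an FDS legend."""
    return min(curve), sum(curve) / len(curve), max(curve)

d_layout = [-1] * 4 + [0] * 4 + [1] * 4   # assumed equal-replication layout
i_layout = [-1] * 3 + [0] * 6 + [1] * 3   # assumed center-heavy layout

d_min, d_mean, d_max = summarize(fds_curve(d_layout))
i_min, i_mean, i_max = summarize(fds_curve(i_layout))
print("equal replication:", round(d_min, 3), round(d_mean, 3), round(d_max, 3))
print("center-heavy:     ", round(i_min, 3), round(i_mean, 3), round(i_max, 3))
```

With these assumed layouts the center-heavy design shows the higher maximum but the lower mean standard error, which is the same qualitative pattern described for Figure 3.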

At this stage we put I-optimal at the forefront. However, neither of the two optimal designs under
consideration provides for a lack-of-fit (LOF) test. Therefore an experimenter cannot assess whether the model
they  chose  provides  an  adequate  approximation  of  the  true  response.  It's  vital  to  keep  in  mind  that  "no 
postulated  model  can  ever  be  assumed  to  be  fully  correct  [therefore]  the  basic  assumptions  underlying  the 
alphabetic-optimality  approach  are  often  unrealistic  from  the  practical  viewpoint  of  actually  designing  real 
experiments"  (Draper  &  Guttman,  1988). 

Modifying  Optimal  Designs  to  Make  Them  More  Robust  to  Model  Misspecification 

We  will  now  do  another  comparison  of  point  selection  using  I-  vs  D-optimal  on  a  one-factor  design  for  a 
quadratic  model,  but  this  time  the  designs  will  be  modified  with  LOF  points  to  check  for  model 
misspecification.  We  recommend  4  runs  for  these  'check-points' — chosen  to  maximize  the  minimum  distance 
from  existing  design  points;  thus  filling  'holes'  in  the  experimental  space.  This  is  known  as  the  "distance" 
criterion.  Figure  4  provides  a  side-by-side  view,  by  way  of  standard  error  plots,  of  how  this  modification  affects 
the  spread  of  points  compared  to  the  purely  optimal  selection  shown  in  Figure  2. 
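
The distance criterion can be sketched as a greedy maximin search: each new check-point is the candidate whose nearest neighbor among the points already in the design is farthest away. The existing points and the candidate grid below are hypothetical, and commercial software may use a different search, so this is only one plausible reading of the criterion.

```python
import math

def add_checkpoints(existing, candidates, n_new):
    """Greedy maximin: repeatedly add the candidate farthest from all chosen points."""
    chosen = list(existing)
    added = []
    for _ in range(n_new):
        best = max(candidates, key=lambda c: min(math.dist(c, p) for p in chosen))
        chosen.append(best)
        added.append(best)
    return added

# Hypothetical one-factor example: model points at the ends and center,
# candidates on a grid of coded levels with a step of 0.05.
existing = [(-1.0,), (0.0,), (1.0,)]
candidates = [((k - 20) / 20,) for k in range(41)]

checkpoints = add_checkpoints(existing, candidates, 4)
print(checkpoints)
```

Points are stored as tuples so the same function works for two or more factors. Ties are broken by candidate order, so a greedy pass like this one need not return a symmetric layout.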

Figure 4. SE plots for LOF-modified I-optimal (left) versus LOF-modified D-optimal (right) for a 12-run, one-factor RSM design.

Observe  how  gaps  in  the  experimental  range  have  been  plugged. 
Next let's recalculate the FDS for the two alternative optimal criteria.

Compare and contrast the curves and data on Figures 3 and 5, the first being purely optimal and the
second modified with lack-of-fit points. Notice that, by all statistical measures and by the curves themselves,
not much differs. In this case the advantage of having a check for model misspecification outweighs the minor
loss in optimality and slight degradation in FDS quality.

Figure 5. FDS plots for LOF-modified I versus D optimal for a 12-run, one-factor RSM design.

Extending  These  Findings  to  Two  Factors 

A similar case can be made for two factors and, by extension, beyond: once the minimum points needed
to fit the chosen polynomial model are selected via an optimal criterion, adding LOF points causes little harm
and does a lot of good. Figure 6 lays out the points on SE plots for a 14-run, I-optimal, RSM-quadratic
design with zero versus four LOF points.

Figure 6. SE plots for purely I-optimal (left) versus LOF-modified (right) for a 14-run, two-factor RSM design.

The  space-filling  effect  of  shifting  some  points  from  optimal  to  distance-based  criterion  (for  LOF)  is  very 
apparent  when  comparing  these  plots  from  left  to  right.  This  comes  at  little  cost  to  the  I-optimal  design  quality 
as  evidenced  by  the  FDS  curve  and  properties  laid  out  in  Figure  7. 

However, when we make the same comparison for D-optimal designs, the trade-off of 4 purely optimal
points for ones chosen by distance (layouts shown in Figure 8) to provide LOF does not go quite as well: see
how the FDS curve shifts in Figure 9 and note the degradation in D-optimality as evidenced by the increased
determinant results.

Figure 7. FDS plots for purely I-optimal versus LOF-modified I-optimal for a 14-run, two-factor RSM design.

Figure 8. SE plots for purely D-optimal (left) versus LOF-modified D-optimal (right) for a 14-run, two-factor RSM design.

Figure 9. FDS plots for purely D-optimal versus LOF-modified D-optimal for a 14-run, two-factor RSM design.

This lends further support for the use of I over D-optimal designs for response surface method optimization.

Bolstering  the  Design  with  Replicates 

Replicates  are  needed  to  provide  the  pure  error  needed  for  a  LOF  test.  In  the  cases  presented  above  they 
came  with  the  optimal  portion  of  the  designs.  A  better  way  to  build  a  practical  experiment  is  to  detail  the 
number  of  replicates  and  choose  them  by  the  optimal  criterion — we  recommend  a  minimum  of  4  points  being 
selected  in  this  manner.  This  provides  enough  degrees  of  freedom  for  a  reasonably-powerful  LOF  test. 
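
The run budget behind these recommendations can be checked with simple bookkeeping: lack-of-fit degrees of freedom come from unique design points beyond the number of model terms, and pure-error degrees of freedom come from replicate runs. The sketch below uses a hypothetical 14-run, two-factor layout (the coordinates are assumptions, not the design in the figures) with 6 quadratic model terms.

```python
from collections import Counter

def lof_degrees_of_freedom(design, n_terms):
    """Return (lack-of-fit df, pure-error df) for a list of design points."""
    n_unique = len(Counter(design))
    df_pure_error = len(design) - n_unique    # extra runs at replicated points
    df_lack_of_fit = n_unique - n_terms       # unique points beyond the model
    return df_lack_of_fit, df_pure_error

# Hypothetical coded layout: 10 distinct points, 4 of which are run twice.
distinct = [(-1, -1), (1, -1), (-1, 1), (1, 1), (0, 0),
            (-1, 0), (1, 0), (0, -1), (0, 1), (0.5, 0.5)]
design = distinct + distinct[:4]

df_lof, df_pe = lof_degrees_of_freedom(design, n_terms=6)
print(len(design), df_lof, df_pe)
```

Six points for the model, four more unique points for lack of fit and four replicates add up to the 14 runs used in the two-factor comparisons.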

Conclusion 

In consideration of the practical aspects of algorithmic design for RSM we recommend adding at least
4 distance-based points for testing lack of fit, even though this makes the experiment less alphabetically
optimal. This is a good trade-off. Furthermore, in physical experiments it is desirable to build in an estimate of
experimental error based on at least 4 degrees of freedom via replicated design point(s). For choice of optimal
criterion we advise using I-optimal designs for empirical modeling (RSM), reserving D-optimal designs (because of their
more-precise estimation of model coefficients) for factorial screening.

However,  even  with  these  practical  aspects  incorporated,  optimal  design  cannot  make  up  for: 

•  Choosing factors that are not among the vital few.

•  Designing  for  an  inadequate  empirical  model. 

•  Going  out  of  the  region  of  operability. 

•  Measuring  the  wrong  responses. 

"The  exact  functional  relationship  is  usually  unknown  and  possibly  unknowable.  We  have  only  to  think  of  the  flight 
of  a  bird,  the  fall  of  a  leaf,  or  the  flow  of  water  through  a  valve  to  realize  that  we  are  likely  to  be  able  to  approximate  only 
the  main  features  of  such  a  relationship."  -  Box  &  Draper 

Application of Statistical Methods and GIS for Downscaling and
Mapping Crop Statistics Using Hypertemporal Remote Sensing

Research Unit on Environment and Conservation of Natural Resources,
National  Institute  of  Agricultural  Research  (INRA),  Rabat,  Morocco.  Email:  ahmed_douaik@yahoo.com 

To  sustain  the  management  of  natural  resources,  land  use  and  land  cover  (LULC)  should  be  spatially  mapped  and 
temporally  monitored  using  GIS.  For  large  areas,  conventional  methods  are  laborious.  Alternatively,  remote  sensing 
can be used for LULC mapping and monitoring. The normalized difference vegetation index (NDVI) is the most widely used
vegetation index for crop identification and phenology. For agricultural areas, crop statistics are estimated yearly at
regional level following administrative units. However, these statistics do not convey the spatial extent of
these crops within administrative units; such information is crucial for crop monitoring. The main objective of this
research  was  to  fill  the  gap,  based  on  statistical  methods  and  GIS,  by  adding  spatial  information  to  crop  statistics  by 
analyzing temporal NDVI profiles. The study area covers 1300 km². Data consist of 147 decadal Spot Vegetation
NDVI  images.  Crop  statistics  were  compiled  on  seasonal  basis  and  aggregated  to  different  administrative  levels. 
Images  were  processed  using  an  unsupervised  classification  method.  A  series  of  classification  runs  corresponding 
to  different  numbers  of  clusters  were  used.  Using  stepwise  multiple  linear  regression,  cropped  areas  from 
agricultural  statistics  were  related  to  areas  of  each  NDVI  profile  cluster.  Estimated  regression  coefficients  were 
used  to  generate  maps  showing  cropped  fractions  by  map  units.  The  optimal  number  of  clusters  was  18.  Similar 
profiles  were  merged  leading  to  eight  clusters.  The  results  show  that,  for  example,  rice  was  grown,  in  autumn,  on 
50%  of  the  area  of  map-units  represented  by  NDVI-profile  group  4  and  75%  of  the  area  of  group  7  while  it  was 
grown,  in  spring,  on  2,  69  and  25%  of  areas  of  NDVI-profile  groups  2,  6,  and  7,  respectively.  Regression 
coefficients were used to generate maps of crops. This research illustrates the benefit of integrating statistical
methods,  GIS,  remote  sensing  and  crop  statistics  to  delineate  NDVI  profile  clusters  with  their  corresponding 
agricultural  land  cover  map  units  and  to  link  these  statistics  to  geographical  locations.  These  map  units  can  be  used 
as  a  reference  for  future  monitoring  of  natural  resources,  in  particular  crop  growth  and  development  and  for 
forecasting  crop  production  and/or  yield  and  stresses  like  drought. 

Keywords:  Crop  Statistics;  GIS;  Multiple  Regression;  NDVI;  Unsupervised  Classification. 

Introduction 

The  food  needs  of  the  ever  increasing  world  population  should  be  satisfied  quantitatively  and  qualitatively. 
Since  the  spatial  extent  of  arable  lands  is  limited,  the  focus  is  currently  on  a  better  and  sustainable  use  and 
management  of  natural  resources,  including  soil  and  land  resources.  In  order  to  attain  this  sustainability,  land 
use and land cover should be spatially mapped and temporally monitored. As the areas to be mapped are very
large, conventional methods based on aerial photo-interpretation are laborious and expensive (Tucker, 1979;
Philipson,  1997;  Taylor  et  al,  2000;  Falkner  and  Dennis,  2002).  Alternatively,  satellite  remote  sensing  tools  can 
be  beneficially  used  for  land  use  and  land  cover  mapping  and  monitoring  (Cihlar,  2000;  Lillesand  et  al,  2007; 
Giri, 2012). In this context, the normalized difference vegetation index (NDVI), which was initially proposed by Rouse et al
(1973) and measures the vigor and greenness of vegetation (Tarpley et al, 1984), is the most widely used among
the vegetation indices for studying vegetation, and specifically crop, phenologies (Sarkar and Kafatos, 2004;
Sakamoto  et  al,  2005;  White  et  al,  2009;  Atkinson  et  al,  2012;  You  et  al,  2013)  and  yield  forecasting  (Das  et  al, 
1993;  Ferencz  et  al,  2004;  Mkhabela  et  al,  2005).  Remote  sensing  was  also  used  for  estimating  cropped  area 
(Campbell  et  al,  1987;  Sheub  and  Atkins,  1991;  Labus  et  al,  2002;  Howard  et  al,  2012).  Time  series  of  NDVI 
were  used  to  discriminate  between  vegetation  and  other  land  uses  (Nordberg  and  Evertson,  2003;  Knight  et  al, 
2006) and, for vegetation, between different green areas specific to crops, forests, etc. (Murakami et al, 2001;
Balaghi  et  al,  2008;  Xie  et  al,  2008).  For  agricultural  areas,  crop  statistics  (mainly  cropped  areas  and  production) 
are estimated yearly at regional level following given administrative units (USDA, 2014). However, these
statistics do not convey the spatial extent of these crops within administrative units, although such information
is crucial for future crop monitoring. For more than two decades, remote sensing has been helpful in
determining crop acreage and productivity (Allen, 1990; Gonzalez-Alonso et al, 1997; Carfagna and Gallego,
2005). However, it is only very recently that hypertemporal remote sensing has been used to describe and map
the variability of cropping patterns of different crops in Spain (Khan et al, 2010), rice in Vietnam (Nguyen et al,
2012), winter crops in Australia (Potgieter et al, 2013), the relationship between the fraction of evergreen forests
and the presence of epiphyllous liverworts in China (Jiang et al, 2013), the gradient in land cover vegetation
growth in Greece (Ali et al, 2013), and natural landscape heterogeneity in Greece (Ali et al, 2014). The main
objective of this research work was to fill the gap in the official crop statistics by adding spatial
information to them through the analysis of hypertemporal remote sensing, i.e., temporal NDVI profiles, using different
statistical methods.

Material  and  Methods 

Study  area 

The study area is situated in the western part of Nizamabad district, Hyderabad province, Andhra Pradesh
state, in central India. The district has an irrigation system used for rice cultivation, cotton cultivation on
vertisols, dryland cropping on poor sandy soils, forests on hilly land, and degraded areas. The study area is
spatially very heterogeneous. The soils are classified into four main orders: Inceptisols, Alfisols, Vertisols, and
Entisols. The climate is tropical with hot summers (maximum mean monthly temperature of about 40 °C) and
cool and dry winters (maximum mean monthly temperature of about 13 °C). Rainfall is about 900
mm, which falls over 2 months during the southwest monsoon. Six Mandals, or sub-districts, are covered
by this study; they span 1300 km², of which 90000 Ha are agricultural land and 18000 Ha are shrub and
non-cultivated area.

Data 

Data consist of 147 geo-referenced and stacked Spot Vegetation composite NDVI images provided by
VITO (http://www.VGT.vito.be). They have a spatial resolution of 1 km² and are available on a decadal (10-day) basis for a
period ranging from April 1998 to April 2002. Two other types of data were used. A land cover map at 1/50000
scale was established from images acquired in 1994/1995 by the Indian remote sensing satellite IRS-C using
the Liss-III sensor (spatial resolution of 23 m). For this work the original 18 legend entries were simplified and
reduced to seven. Crop statistics were compiled on a seasonal basis and aggregated to different
administrative levels (CPO, 2001).

Methods 

The normalized difference vegetation index is defined by:

$$\mathrm{NDVI} = \frac{IR - R}{IR + R} \qquad \text{Eq. (1)}$$

IR and R are the infrared (0.78 - 0.89 µm) and red (0.61 - 0.68 µm) bands, respectively, which are bands
3 and 2 for Spot Vegetation.

The NDVI values were reported as digital number (DN) values, ranging between 0 and 255, using the
following equation:

$$\mathrm{DN} = \frac{\mathrm{NDVI} + 0.1}{0.004} \qquad \text{Eq. (2)}$$
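
As a quick check of Eq. (1) and Eq. (2), the conversions can be written out as below. The function names are hypothetical, and rounding and clamping the DN to the 0 to 255 range are assumptions added here, since the text only gives the scaling formula.

```python
def ndvi(ir, red):
    """Eq. (1): normalized difference of infrared and red reflectance."""
    return (ir - red) / (ir + red)

def ndvi_to_dn(value):
    """Eq. (2): scale NDVI to a digital number (rounding and clamping assumed)."""
    return max(0, min(255, round((value + 0.1) / 0.004)))

def dn_to_ndvi(dn):
    """Inverse of Eq. (2)."""
    return dn * 0.004 - 0.1

# Hypothetical reflectances for a vegetated pixel.
example = ndvi(ir=0.5, red=0.1)
print(round(example, 4), ndvi_to_dn(example))
```

With this scaling, DN values of 0 and 255 correspond to NDVI values of -0.1 and 0.92.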

The stacked 147 NDVI images were processed using the ISODATA clustering algorithm (Mather and Koch,
2011), an unsupervised classification method available in the Erdas Imagine software (Erdas, 2003). A series
of classification runs corresponding to different numbers of clusters (2 to 30) was used. The ISODATA
algorithm tries to minimize the Euclidean distance to form clusters. The results of the different runs are
compared using the divergence separability, which is a statistical measure of distance (Landgrebe, 2003); the
'best' number of clusters is the one corresponding to the run having the highest minimum and/or average
divergence (Swain and Davis, 1978). The maximum number of iterations was 50 and the divergence threshold
was 1. The spectral signatures of the clusters are represented graphically and similar NDVI profiles are merged
to reduce the unnecessarily large number of clusters.

Once  the  number  of  clusters  is  known,  the  NDVI  profile  clusters  map  is  established  and  compared  to  the 
land  cover  map  to  match  the  preliminary  legend  of  the  former  with  the  legend  of  the  latter  and  also  to  get  an 
idea  about  the  land  cover  classes  that  are  present  in  each  of  the  NDVI  profile  clusters. 

The  NDVI  profile  clusters  map,  a  raster,  is  converted  to  polygons  and  cropland  areas  are  masked  by  using 
the  land  cover  map  to  keep  only  NDVI  profile  clusters  corresponding  to  agricultural  land  (Maselli  and  Rembold, 
2001; Kastens et al, 2005).

Using  GIS  spatial  analysis  functions  from  ArcGIS  (ESRI,  2009),  the  Mandals  and  the  agricultural  masked 
NDVI  profile  clusters  map  are  overlaid  to  determine  the  respective  areas  (Ha)  of  each  NDVI  profile  cluster  by 
Mandal. These areas are then used as explanatory (independent) variables in a stepwise multiple linear
regression (Neter et al, 1996), with the cropped areas (Ha) from agricultural statistics by season, crop, and
Mandal as the dependent variable:

$$CA = \sum_{i=1}^{n} c_i \cdot \mathrm{NDVIcluster}_i \qquad \text{Eq. (3)}$$

with CA representing cropped area (Ha) by Mandal and $\mathrm{NDVIcluster}_i$ representing the area (Ha) of the i-th NDVI
profile cluster.

No constant was considered in the regression and the coefficients $c_i$ were constrained to the 0 - 1 range in
order to determine the estimated fraction or percentage of the total area of a given NDVI profile cluster where a
given crop was grown in a given Mandal and a given season. Once the regression coefficients were estimated,
the above equation was used to generate maps showing cropped fractions by map units. Statistical computations
were done using the SPSS software (SPSS, 2008).
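
A regression of this form, with no constant and coefficients bounded between 0 and 1, can be sketched as follows. The sketch substitutes projected gradient descent for the stepwise procedure run in SPSS, so it does not select variables. The areas for six Mandals and two NDVI clusters are hypothetical numbers constructed so that the true fractions are 0.50 and 0.75.

```python
def fit_bounded_fractions(cluster_areas, crop_areas, n_iter=2000):
    """Least squares without intercept, coefficients clipped to [0, 1].

    cluster_areas: one row per Mandal, one column per NDVI cluster (Ha).
    crop_areas: cropped area per Mandal (Ha).
    """
    n_clusters = len(cluster_areas[0])
    coef = [0.5] * n_clusters
    # The squared Frobenius norm bounds the largest eigenvalue of A'A,
    # so its reciprocal is a safe step size.
    step = 1.0 / sum(a * a for row in cluster_areas for a in row)
    for _ in range(n_iter):
        residuals = [sum(c * a for c, a in zip(coef, row)) - y
                     for row, y in zip(cluster_areas, crop_areas)]
        gradient = [sum(r * row[j] for r, row in zip(residuals, cluster_areas))
                    for j in range(n_clusters)]
        # Gradient step followed by projection onto the 0 - 1 box.
        coef = [min(1.0, max(0.0, c - step * g)) for c, g in zip(coef, gradient)]
    return coef

# Hypothetical areas (Ha) of two NDVI clusters in six Mandals.
cluster_areas = [[1000, 2500], [3000, 200], [500, 1500],
                 [2000, 800], [100, 3000], [4000, 600]]
# Hypothetical cropped areas (Ha), equal to 0.50 and 0.75 of the two clusters.
crop_areas = [2375, 1650, 1375, 1600, 2300, 2450]

fractions = fit_bounded_fractions(cluster_areas, crop_areas)
print([round(f, 3) for f in fractions])
```

With only six Mandals as observations, the number of clusters that can enter a model is small, which is one reason a stepwise selection is a natural choice in the original analysis.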

Results  and  Discussion 

Average and minimum divergence values between clusters corresponding to the different runs (2 to 30
clusters) are reported in Figure 1.

Figure 1. Average (left) and minimum (right) separability divergence values.

The highest value for the average divergence corresponds to 18 clusters, whereas the highest value for the
minimum divergence corresponds to only 4 clusters, although 18 clusters also resulted in a reasonably high divergence
value. So, based on these results, the optimal number of clusters giving the best separability between them was
taken to be 18. The corresponding average spectral signatures are displayed in Figure 2.

Figure 2. Average spectral signatures of the 18 NDVI profile clusters.

From this figure, some profiles (mainly 1, 2, and 15) have a distinct pattern while most of the others have
a more or less similar pattern. The most similar profiles were merged: profiles 3 to 7 and 10; profiles 8, 9, 11 and 14;
profiles 12 and 13; and profiles 17 and 18. This merging of NDVI profiles finally resulted in only eight
clusters (Figure 3) and the corresponding NDVI-unit map is displayed in Figure 4.

Figure 3. Average spectral signatures of remaining 8 NDVI profile clusters after merging similar ones.

Figure 4. NDVI profile cluster map for agricultural land cover. N.B.: cluster 1 is not present in the six Mandals.


White zones correspond to non-agricultural areas. NDVI profile cluster 1 was present in the whole image
of the Nizamabad district but not in the six Mandals or sub-districts corresponding to the study area. Also,
clusters 2 and 8 are largely under-represented whereas clusters 3, 4, and 7 are much more present. These results
are confirmed by statistics provided in Table 1, which shows the relation between agricultural land cover (either
in Kharif, Rabi or both seasons) and the NDVI profile clusters and their corresponding areas and fractions.
Cluster 3 is present in almost half of the total agricultural area, cluster 7 in a fifth of this area and clusters 4 and 6
in 15 and 10%, respectively. In contrast, clusters 2 and 8 are present in less than 1% while cluster 5 is present
in 4% of the total agricultural area.

The  results  of  stepwise  multiple  linear  regression,  for  the  main  crops  by  season,  are  reported  in  Table  2. 

This table shows that clusters 2, 5, and 8 are not involved in the regression models, at least for the crops
used at this step. This relates directly to the very limited extent of these clusters (see Table 1). Rice was grown,
in the Kharif season, on 50% of the area of map-units represented by NDVI-profile group 4 and 75% of the
area of group 7 while it was grown, in the Rabi season, on 2, 69 and 25% of areas of NDVI-profile groups 2, 6, and
7, respectively.


Table 1

Areas (Ha) and percentage of agricultural land cover by season corresponding to each NDVI unit

NDVI unit / Season    Kharif     Rabi   Both seasons   Area (Ha)   Percentage
2                          1        8              0           9         0.01
3                      21602    19806           1001       42409        48.92
4                       9875     2199           1414       13488        15.56
5                       1289     1104            985        3378         3.90
6                       3067     2426           3426        8920        10.29
7                        240     3393          14583       18216        21.01
8                        164       93              7         264         0.30
Total Area (Ha)        36239    29029          21417       86685       100
Percentage             41.81    33.49          24.71         100

Table 2

Adjusted R2 and coefficients (%) for stepwise multiple linear regression with total areas (Ha) for main crops in
both seasons

Season   Crop        Adjusted R2   Coefficients (%) for retained NDVI units (among 3, 4, 6, 7; in column order)   Area (Ha)
Kharif   Cotton      87.5          15.6                                                                           6860
Kharif   Maize       81.3          4.1                                                                            482
Kharif   Pulses      96.9          48.0, 64.1                                                                     29121
Kharif   Rice        95.0          50.3, 75.3                                                                     22774
Kharif   Sugarcane   89.9          26.0                                                                           2395
Rabi     Groundnut   80.3          53.2                                                                           5942
Rabi     Pulses      80.9          5.5                                                                            2824
Rabi     Rice        99.8          1.8, 69.1, 25.0                                                                11481
Rabi     Sorghum     86.1          32.5                                                                           15454
Rabi     Sugarcane   85.9          21.6                                                                           1960

Total Area (Ha) for both seasons by NDVI unit: unit 3, 42409; unit 4, 13488; unit 6, 8920; unit 7, 18216

The above regression coefficients were used to generate maps of crops. For illustration, the maps of rice, for
both seasons, are displayed in Figure 5.

The comparison of these two maps shows that rice is cropped in both seasons mainly in 3 Mandals: the
southernmost one and the two located in the North-East of the study area. In the southernmost Mandal,
rice is cropped in different locations in the two seasons. For the two North-East Mandals,
the non-cropped areas (0%) in Kharif were intensively cropped (69%) in Rabi whereas the areas intensively
cropped (75%) in Kharif were moderately cropped (25%) in Rabi.
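
The mapping step can be illustrated with a small sketch: each pixel of the NDVI cluster map receives the regression coefficient of its cluster as the estimated cropped fraction. The 3 x 3 cluster raster is hypothetical. The rice fractions are those reported for groups 4 and 7 in Kharif and for groups 6 and 7 in Rabi; the small third Rabi coefficient is left out of this sketch.

```python
def fraction_map(cluster_raster, fractions):
    """Replace each cluster id by its estimated cropped fraction (0 if not in the model)."""
    return [[fractions.get(cluster, 0.0) for cluster in row] for row in cluster_raster]

# Rice coefficients by NDVI-profile group, expressed as fractions.
kharif_rice = {4: 0.503, 7: 0.753}
rabi_rice = {6: 0.691, 7: 0.250}

# Hypothetical 3 x 3 raster of NDVI cluster ids (1 km² pixels).
clusters = [[3, 4, 4],
            [6, 7, 7],
            [6, 6, 3]]

kharif_map = fraction_map(clusters, kharif_rice)
rabi_map = fraction_map(clusters, rabi_rice)

# Each 1 km² pixel is 100 Ha, so the estimated rice area is the sum of fractions x 100.
kharif_area_ha = sum(f * 100 for row in kharif_map for f in row)
print(kharif_map, round(kharif_area_ha, 1))
```

In this toy raster a group 6 pixel carries no rice in Kharif but a fraction of 0.691 in Rabi, which is the kind of seasonal shift described for the two North-East Mandals.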

Figure 5. Estimated maps for rice grown in Kharif (left) and Rabi (right) seasons (fractions / km²).

Conclusions 

This  research  work  illustrated  the  benefit  of  integrating  hypertemporal  remote  sensing  data  with  crop 
statistics  to  delineate  NDVI  profile  clusters  with  their  corresponding  agricultural  land  cover  map  units  and  to 
link  these  statistics  to  geographical  locations  (mainly  administrative  units).  These  map  units  can  be  used  as  a 
reference  for  future  monitoring  of  natural  resources,  in  particular  crop  growth  and  development  and 
consequently for forecasting crop production and/or yield and stresses like drought.

Crop Yield Estimation with Farmers' Appraisal on Weather Condition

Jean  Baptiste  HABYARIMANA 
Agriculture  Statistician 
Ministry  of  Agriculture  and  Animal  Resources 
Kigali,  Rwanda 

Crop yield is mainly affected by weather conditions, inputs, and agricultural policies. In crop yield estimation, farmers' perception of weather conditions leads to an assessment of how good the yield will be compared with previous seasons. This paper applies a Bayesian estimation method to estimate crop yield using farmers' appraisal of weather conditions. The paper shows that crop yield estimation with farmers' appraisal of weather conditions takes risk into account in proportion to climate change. In light of the United Nations' efforts to build a consolidated agricultural statistical system across countries, the statistical model developed here should provide an important tool for both crop yield estimation and food price analysis.

Key  words:  Crop  Yield  Estimation,  Farmers'  Appraisal  on  Weather  Condition,  Crop  growing  condition,  Bayesian 
Estimation  Method 

Introduction 

Land productivity is a vital factor in feeding the population. It is also a critical factor in the struggle of developing countries to improve the availability of food. In line with the MDGs, developing countries have undertaken different measures to improve land productivity. Given all those policies, the main challenge for agricultural statisticians in many developing countries remains the development of statistical models that can provide reliable, highly accurate crop yield estimates to monitor and assess the progress of agricultural land productivity. This study undertakes this task and offers a method complementary to those available in the literature, such as crop cutting and farmers' estimates.

Literature  Review 

In most cases, yield forecasts are based on reports made by crop correspondents at regular intervals during the growing season, using crop appearance as an indicator. In this context, the most accurate method of estimation is the crop-cutting method (G.R. Spinks).

Thus, crop condition data have the potential to provide a simple, regular source of information about the realized yield. In addition, weather affects crops differently during different stages of crop growth (Ranjana Agrawal, 2012).


Corresponding  author:  Jean  Baptiste  HABYARIMANA,  Agriculture  Statistician,  Ministry  of  Agriculture  and  Animal 
Resources,  Kigali,  Rwanda.  E-mail:  kradrimn@gmail.com. 




To account for the impact of weather conditions on crop growth and yields, Stasny, Goel and other researchers at the Ohio State University developed a Bayesian mixed-effects county yield estimation algorithm with a spatial component involving correlations among neighboring counties (Michael E. Bellow, 2007). In addition, the Bayesian probability approach to yield forecasting involves collecting expert opinion data from farmers who are actually engaged in raising the crop, regarding their assessments of the likely crop production (Chandrahas).

Estimation  Method 

Drawing on the crop yield forecasting and estimation literature, crop yield is a function of weather conditions, inputs and agro-ecological conditions, including land and external input use. To bridge the gap between crop yield and the risk resulting from climate change, this paper models crop yield as a function of weather and crop growing conditions (appearance), subject to farmers' information on weather.

Statistical  Model 

From  a  Bayesian  standpoint,  true  model  parameters  were  unknown  and  therefore  considered  to  be  random, 
and  they  were  assigned  a  joint  probability  distribution.  Prior  distribution  was  used  to  summarize  our  state  of 
knowledge,  or  what  is  currently  known  about  the  parameters.  Once  the  data  were  observed,  the  evidence 
provided  by  the  data  was  combined  with  the  prior  distribution  using  Bayes'  Theorem.  The  result  of  combining 
prior  information  and  empirical  evidence  was  an  updated  posterior  distribution  for  the  parameters. 

In this context, four pieces of information were used: actual yields, yield targets, realized performance towards yield targets, and farmers' appraisal of weather conditions. Those pieces of information were also used to estimate variance and covariance information on yield. In the empirical example used for model illustration, six crops were considered when forecasting yields.

In the first stage of parameter estimation, the forecasted average yield $y_{fj}$ was computed using actual yields of Maize, Wheat, Rice, Beans, Cassava and Irish Potatoes, together with farmers' appraisal of weather conditions:

Box 1

$y_{fj} = \sum_{i} \omega_{ij} C_{ik}$

where $\omega_{ij} = y_{ij} \cdot p_{ij}$ and $p_{ij} = \dfrac{y_{ij}}{\sum_{i=1}^{4} y_{ij}}$.

Here $y_{ij}$, $i \in \{1, 2, 3, 4\}$, are the yield values at the four percentiles dividing actual yields into five equal groups, for crop $j$ = Maize, Rice, Wheat, Beans, Cassava, and Irish Potatoes. The four-scale weather conditions are denoted by $k$ = Poor, Fair, Good, Very Good, and $\omega_{ij}$ is the expected yield in each scale for crop $j$. Weather conditions $C_k$ are associated with actual yields $y_{ij}$ through $C_{ik}$, with $C_{ik}$ = {1 = Poor, 2 = Fair, 3 = Good, 4 = Very Good | percentile $i$ is linked with its corresponding weather condition $k$}.
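As a rough illustration of the first stage, the sketch below works through the Cassava percentile figures listed in the annex. It rests on an assumption: the annex columns appear to form a weighted mean of the percentile yields, with weights $\omega_{ij} C_{ik}$ normalized by their sum, and that reading reproduces the reported forecast closely. The function name `forecast_yield` is hypothetical, and the weather weights are simply the values printed in the annex table.

```python
def forecast_yield(percentile_yields, weather_weights):
    """Weighted mean of percentile yields (one possible reading of stage 1)."""
    total = sum(percentile_yields)
    # p_ij: share of each percentile yield in the sum of the four yields
    p = [y / total for y in percentile_yields]
    # omega_ij = y_ij * p_ij: expected yield in each scale
    omega = [y * share for y, share in zip(percentile_yields, p)]
    # Combine with the weather weights C_ik (assumed normalization step)
    weights = [w * c for w, c in zip(omega, weather_weights)]
    weighted_sum = sum(y * w for y, w in zip(percentile_yields, weights))
    return weighted_sum / sum(weights)


# Cassava yields at the 20th, 40th, 60th and 80th percentiles (annex)
cassava_yields = [5755, 7056, 10857, 12499]
# Weather weights as printed in the annex percentile tables
weather_weights = [0.476, 0.247, 0.148, 0.128]

cassava_forecast = forecast_yield(cassava_yields, weather_weights)
print(f"Cassava forecast: {cassava_forecast:.0f} Kg/Ha")
```

Under this reading the result lands close to the forecasted Cassava yield of 9,417 Kg/Ha reported in Table 2; other readings of the garbled formula are possible.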

In the second stage, the actual performance towards yield targets for each crop $j$, $R_j$, was computed using the average yields $y_{fj}$ and the yield targets $Y_{tj}$:

Box 2

$R_j = \dfrac{y_{fj}}{Y_{tj}}$

In the third stage, the average yields $y_{fj}$, the yield targets $Y_{tj}$, the actual yield performance $R_j$ and their associated reliabilities were used to estimate covariance and variance information on yields.

To develop Bayesian methods for generating yield estimates from readily available information on crop and weather conditions, it was assumed that average yields are normally distributed, $d_1 \mid x_1 \sim N_p(D_1 x_1, \Sigma_1)$, and that the posterior distribution of yields is given by $x_1 \mid d_1 \sim N_n(\mu, V)$, where $\mu$ is the estimated average yield and $V$ is the variance:

$\mu = (A^{+} H A^{+T}) D_1^{T} (\Sigma_0^{-1} d_1)$

$V = A^{+} H A^{+T} - A^{+} H A^{+T} D_1^{T} \Sigma_0^{-1} D_1 A^{+} H A^{+T}$

where $A^{+} = A^{T}(AA^{T})^{-1}$ is the Moore-Penrose inverse and

$\Sigma_0 = \Sigma_1 + D_1 A^{+} H A^{+T} D_1^{T}$

(see Theorem 1 in Magnus, Tongeren and Vos (2000)).
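The posterior formulas above can be transcribed almost directly into matrix code. The following is a minimal sketch using NumPy, assuming a zero prior mean as the formula for $\mu$ implies; the function name `posterior` and the scalar demonstration values are hypothetical and are not taken from the paper.

```python
import numpy as np


def posterior(A, H, D1, Sigma1, d1):
    """Posterior mean and variance of x given data d1 (zero prior mean assumed)."""
    A = np.asarray(A, dtype=float)
    H = np.asarray(H, dtype=float)
    D1 = np.asarray(D1, dtype=float)
    Sigma1 = np.asarray(Sigma1, dtype=float)
    d1 = np.asarray(d1, dtype=float)

    # Moore-Penrose inverse for a full row-rank A: A+ = A^T (A A^T)^-1
    A_plus = A.T @ np.linalg.inv(A @ A.T)
    # Prior variance of x implied by the prior on A x
    P = A_plus @ H @ A_plus.T
    # Sigma_0 = Sigma_1 + D1 A+ H A+^T D1^T
    Sigma0 = Sigma1 + D1 @ P @ D1.T
    gain = P @ D1.T @ np.linalg.inv(Sigma0)

    mu = gain @ d1          # posterior mean
    V = P - gain @ D1 @ P   # posterior variance
    return mu, V


# Scalar demonstration with made-up numbers: prior variance 4, data variance 1
mu_demo, V_demo = posterior(
    A=np.eye(1), H=[[4.0]], D1=np.eye(1), Sigma1=[[1.0]], d1=[10.0]
)
print(mu_demo, V_demo)
```

In the scalar case the result reduces to the familiar normal-normal update: the observation is shrunk towards the zero prior mean by the factor 4 / (4 + 1), and the posterior variance is smaller than both the prior and the data variance.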

Case  study  Development 

The agriculture sector is predominant in Rwanda, and the Government of Rwanda has invested heavily in it by introducing new seed varieties, reinforcing the use of chemical fertilizers and pesticides, soil management and land rehabilitation, anti-erosion activities, farmer field schools, etc. Nevertheless, Rwanda experiences food price fluctuations over time. Hence, as growing-season forecasts of crop yields are of considerable interest to commodity market participants and price analysts, the main issue remains the development of a consistent method, with a compatible predictive statistical model, able to produce yield estimates with much higher precision to explain those phenomena.

The  yields  data  for  empirical  illustration  used  in  this  paper  came  from  Crop  Assessment  Surveys. 

This paper hypothesizes that a Bayesian estimation method that considers past information on crop yields, farmers' present knowledge of weather conditions and present agro-ecological information could provide a simpler and peerless complementary statistical model to estimate crop yield.

Results  and  Discussion 

The farmers' problem is an optimization problem in which estimated yields should be as close as possible to forecasted yields. The optimization problem shows that:

(1) The expected yields are less than the average of yields recorded in the 22 past seasons;

(2) Forecasted yields are less than the average of yields of the two latest seasons;

(3) Estimated yields are greater than the average of yields recorded in the 22 past seasons;

(4) The final results show that the estimated crop yields tend to deviate from the forecasted crop yields by the mean of yields recorded in the 22 past seasons.

Descriptive  Statistics 

Table 1 shows descriptive statistics: mean yield (Kg/Ha), standard deviation, and minimum and maximum yields (Kg/Ha) realized in the 11 past years, with 22 observations.

Figure 1 shows the trends (2002 to 2012) of Maize, Wheat, Rice, Beans, Irish Potato and Cassava yields. The trends show an increase in yields over time for all crops considered.

Crop  Yield  Estimates 

The final results shown in Table 2 reveal that, when estimated yields for the 2013 agriculture year are compared with actual yields realized in the 2012 agriculture season, yields (Kg/Ha) for Maize, Wheat, Rice, Irish Potato, Cassava and Beans could increase if the weather conditions do not change, or change only slightly, as appraised by farmers.

Table 2 shows that expected yields are less than the mean of actual yields. The mean of actual yields is less than forecasted yields. Forecasted yields are less than estimated yields.

Table 1

Descriptive Statistics

Variable       Obs   Mean   Std. Dev.   Min    Max
maize          22    1264   717         559    2820
wheat          22    1096   598         377    2208
rice           22    4020   1063        1890   5751
beans          22    837    190         500    1101
Irish potato   22    8003   2241        5409   12605
cassava        22    9093   3135        5177   13974

Figure 1. Trend of Crop Yield over last 11 Years.


Table 2

Crop Yields Estimates

                                   Maize   Wheat   Rice    Beans   Irish Potato   Cassava
Yj = Actual Yields (2002 - 2012)   1,260   1,125   4,031   855     8,055          9,326
Qj = Expected Yields               1,123   1,118   3,378   701     6,497          7,847
Y2012 = Actual Yields              2,186   1,916   5,597   1,265   12,115         14,934
Ytj = Targeted Yields (1)          3,750   3,500   7,000   1,850   27,500         35,000
Yfj = Forecasted Yields            1,459   1,440   3,980   829     7,809          9,417
Yej = Estimated Yields for 2013A   2,934   2,889   5,878   1,397   17,793         22,965

(1) Yields collected in the Rwanda Agriculture Agenda ("Agenda Agricole").

Assessment of the progress of agricultural land productivity

To assess the progress of land productivity in feeding the population and making food available at markets, performances were computed as the basis of crop yield monitoring: i) performance with forecasted yields, which combines past information, states what the performance towards targeted yields was at the time those targets were set; ii) actual performance was computed with reference to the average yields of the latest two agriculture seasons, to state how far agricultural households were from reaching targeted yields in 2012; iii) performance with estimated yields was computed with reference to the estimated yields and targeted yields, to state how far agricultural households should be from reaching targeted yields in 2013.

Table 3

Land productivity assessment

                                         Maize   Wheat   Rice   Beans   Irish Potato   Cassava
Rj = Performance with Forecasted Yield   39%     41%     57%    45%     28%            27%
Actual Performance                       67%     75%     71%    66%     64%            63%
Performance with Estimated Yield         78%     83%     84%    76%     65%            66%
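The first and third rows of Table 3 can be checked against Table 2 by dividing the forecasted and estimated yields by the targeted yields. The short sketch below does this; the dictionary and function names are hypothetical, and the "Actual Performance" row is left out because the section does not give enough detail to reproduce it.

```python
# Yields in Kg/Ha, copied from Table 2
targeted = {"Maize": 3750, "Wheat": 3500, "Rice": 7000,
            "Beans": 1850, "Irish Potato": 27500, "Cassava": 35000}
forecasted = {"Maize": 1459, "Wheat": 1440, "Rice": 3980,
              "Beans": 829, "Irish Potato": 7809, "Cassava": 9417}
estimated = {"Maize": 2934, "Wheat": 2889, "Rice": 5878,
             "Beans": 1397, "Irish Potato": 17793, "Cassava": 22965}


def performance_percent(yields, targets):
    """Performance ratio R_j = yield / target, as a rounded percentage."""
    return {crop: round(100 * yields[crop] / targets[crop]) for crop in targets}


perf_forecasted = performance_percent(forecasted, targeted)
perf_estimated = performance_percent(estimated, targeted)

for crop in targeted:
    print(f"{crop:13s} forecasted {perf_forecasted[crop]}%  "
          f"estimated {perf_estimated[crop]}%")
```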

Performance ratios derived from the model, which was formulated using prior information and empirical evidence, could be used in a meaningful way in land productivity and food market price analyses. Performance ratios could also describe the relationship between agricultural production on the one hand and, on the other, the efforts made to improve land productivity, such as the introduction of new seed varieties, the use of fertilizers and pesticides, soil management and extension services. The model could play an important role in analyzing the impact of climate change on the agriculture sector and the efforts made to address climate change.

Policy  Implication 

The estimation model developed in this paper could help decision makers and policymakers: i) to monitor land productivity and yield targets; ii) to link crop yield and crop production; iii) to monitor food availability, food demand and market access when assessing food shortages and planning for food redistribution; iv) to link weather conditions, as a constraint on land productivity and food availability, with meaningful early warnings to lower crop production risks and ensure public awareness and preparedness to act; v) to correlate food prices with agricultural production and climate change.

Concluding  Remarks 

This paper develops a predictive statistical model based on the Bayesian method. The statistical model developed for crop yield estimation could contribute to the development and credibility of agricultural statistics. This paper offers new insights and paves the way for agricultural policy analysis by providing timely, high-precision, credible and compatible yield estimates that could lead to reliable crop production estimation. As crop yields have a significant impact on both commodity prices and farmer income, the growing-season estimates of crop yields provided by this model could be of considerable use to price and food security analysts.

Annexes

Methodology 

To perform the Bayesian estimation method for crop yield forecasting, five matrices were estimated and used in the Bayesian estimation model: $d_{6\times 1}$, the matrix of targeted yields; $D_{6\times 12} = (I_{6\times 6} \;\; 0_{6\times 6})$, the matrix for the crops to be used in yield forecasting; $A_{12\times 12}$, the matrix of the actual level reached towards the targeted yield $Y_t$; the variance matrix ($\Sigma_1$); and the covariance matrix ($H$). The matrix dimensions follow the number of crops whose yields are to be estimated. In the empirical example used for model illustration, six crops were considered, and yield estimates are needed for each crop (6 + 6 = 12). The full steps to estimate those matrices are illustrated below.

In the first stage of parameter estimation, the forecasted average yield $y_{fj}$ was computed from actual yields for Maize, Rice, Beans, Cassava and Irish Potatoes, and from farmers' appraisal of weather conditions, as follows:

$y_{fj} = \sum_{i} \omega_{ij} C_{ik}$

where $\omega_{ij} = y_{ij} \cdot p_{ij}$ with $p_{ij} = \dfrac{y_{ij}}{\sum_{i=1}^{4} y_{ij}}$.

Here $y_{ij}$, $i \in \{1, 2, 3, 4\}$, are the yield values at the four percentiles dividing actual yields into five equal groups, for crop $j$ with $j$ = {Maize, Rice, Wheat, Beans, Cassava, and Irish Potatoes}; $k$ = {Poor, Fair, Good, Very Good} indexes the weather condition $C_k$; and $\omega_{ij}$ is the expected yield in each category for crop $j$. Weather conditions $C_k$ are associated with actual yields $y_{ij}$ through $C_{ik}$, with $C_{ik}$ = {1 = Poor, 2 = Fair, 3 = Good, 4 = Very Good | percentile $i$ is linked with its corresponding weather condition $k$}.

In the second stage, the actual performance towards yield targets for each crop $j$, $R_j$, is computed from $y_{fj}$, the forecasted yield for crop $j$, and $Y_{tj}$, the yield target for crop $j$:

$R_j = \dfrac{y_{fj}}{Y_{tj}}$

In the third stage, $Y_j$ and their associated reliabilities (in this paper reliability was assigned to be High Medium, "HM = 3%", and Medium, "M = 6%") were used to estimate the variance matrix ($\Sigma_1$), while $Y_{tj}$, $R_j$ and their associated reliabilities were used to estimate the covariance matrix ($H$). Matrix $D$ was estimated using the identity matrix (6x6) for the six crops used in this paper and the zero matrix (6x6) for the forecasted yields of the six crops, therefore $D_{6\times 12} = (I_{6\times 6} \;\; 0_{6\times 6})$, together with the column matrix ($d_{6\times 1}$) of Laboratory yield.

Assume that (i) $d_1 \mid x_1 \sim N_p(D_1 x_1, \Sigma_1)$, where $D_1$ ($p \times n$) has full row rank and $\Sigma_1$ is positive definite (hence non-singular); (ii) $A x_1 \sim N_m(h, H)$, where $A = (A_1 : A_2)$, $h = (h_1; h_2)$ is a column vector, and $H = (H_1; H_2)$ is a block-diagonal matrix with $H_1$ associated with $A_1$ and $H_2$ with $A_2$; (iii) $A$ has full row rank and $H$ may be singular. Then the posterior distribution of $x_1$ is given by $x_1 \mid d_1 \sim N_n(\mu, V)$ with:

$\mu = (A^{+} H A^{+T}) D_1^{T} (\Sigma_0^{-1} d_1)$

and

$V = A^{+} H A^{+T} - A^{+} H A^{+T} D_1^{T} \Sigma_0^{-1} D_1 A^{+} H A^{+T}$

where $A^{+} = A^{T}(AA^{T})^{-1}$ is the Moore-Penrose inverse, and

$\Sigma_0 = \Sigma_1 + D_1 A^{+} H A^{+T} D_1^{T}$


Description of data

Crop Yields 2002 - 2012 (22 Agricultural Seasons)

Yield (Kg/Ha)   Maize   Wheat   Rice    Beans   Irish Potato   Cassava
2002A           927     800     2,714   755     6,682          7,700
2002B           664     755     3,335   595     8,067          7,098
2003A           804     725     4,000   773     6,818          7,318
2003B           577     538     1,890   550     5,682          6,255
2004A           845     725     3,188   709     6,864          7,045
2004B           559     760     3,300   500     5,409          5,773
2005A           860     785     3,188   709     7,259          7,032
2005B           736     940     4,583   514     7,000          5,727
2006A           824     628     3,188   680     7,168          6,905
2006B           709     503     4,525   1,007   6,535          5,478
2007A           723     533     4,525   988     6,500          5,671
2007B           706     377     2,817   833     5,800          5,177
2008A           1,114   1,430   3,791   988     6,050          12,425
2008B           718     419     2,917   831     6,667          11,300
2009A           1,797   2,208   4,479   1,019   6,533          13,974
2009B           1,551   1,371   4,159   843     10,537         12,345
2010A           2,820   2,127   4,786   1,076   10,563         10,616
2010B           1,853   1,329   5,137   970     9,563          12,609
2011A           2,270   1,924   5,211   937     12,102         10,917
2011B           2,283   2,039   5,751   1,011   11,186         13,933
2012A           2,406   1,562   5,725   1,101   12,605         12,072
2012B           1,965   2,270   5,469   1,428   11,625         17,795

Descriptive Statistics

Variable       Obs   Mean   Std. Dev.   Min    Max
maize          22    1264   717         559    2820
wheat          22    1096   598         377    2208
rice           22    4020   1063        1890   5751
beans          22    837    190         500    1101
Irish potato   22    8003   2241        5409   12605
cassava        22    9093   3135        5177   13974

Cut-off

Pii 

n 

r 

V  P  c 

n  c 

20 

7H8 

U.  1 J 

1  r\i 

r\  a  7fi 
U.4  / 0 

3o, U  /  / 

C  1 

J  1 

Maize  Percentiles 

40 
60 

oUo 

1077 

n  1 7 
y).  i  / 

0.23 

1  1Q 

248 

f\  1A1 

0.148 

Z  /,olo 

39,489 

1A 

37 

80 

7H87 
ZU5  / 

U.4J 

Q1 1 
yj  1 

ft  1  78 
U.1Z5 

7/1  8  fC\A 
Z4o,0U4 

11Q 

i  iy 

20 

^1A 
J  JO 

n  1 7 

U.  1Z 

04 

ft  47£ 
U.4  /  O 

1  £  7ft  1 
1  0,ZU1 

ift 

jU 

Wheat  Percentiles 

40 
60 

/JO 

1252 

n  1 7 

0.28 

1  77 
1Z  / 

347 

ft  7/1  7 

0.148 

71  £1Q 

64,283 

1 1 
j  1 

51 

80 

1  07H 

iy  /u 

n  aa 

U.44 

e^ft 

oOU 

U.  IZo 

7  1  £  8/1/1 
Z  1  0,o44 

1  1  ft 

1 1U 

20 

1H7Q 
jU  ly 

n  1  q 
u.  iy 

JOD 

ft  47^ 

U.4  /  o 

8^8  470 

o  Jo,  4  ly 

77Q 

Z  ly 

Rice  Percentiles 

40 
60 

1/1  7£ 
34Z0 

4516 

r\  1 1 

U.Zl 

0.28 

77^ 

/Zj 
1260 

ft  7/1  7 

0.148 

0 1  J,  J 1 0 

841,942 

1  7Q 

1  ly 
186 

80 

j  10  / 

U.3Z 

1  &AQ 

U.IZo 

1  ftaft 

7  1  1 
Z  1  1 

20 

040 

ft  1  o 
u.  iy 

171 
1ZJ 

ft  A  If 
U.4  1  0 

17  £Q£ 

j  /,oyo 

^8 

Jo 

Beans  Percentiles 

40 
60 

784 

963 

n  7i 
0.28 

1  o  1 

272 

ft  747 
U.Z4  / 

0.148 

14  07ft 

J4,y  /u 
38,816 

4^ 
4j 

40 

80 

1  m  a 

ft  in 

U.  JU 

1ft7 

n  1 78 

U.  IZo 

10  1  04 
jy,  1  yH 

10 

jy 

20 

U  JZ.U 

ft  ?ft 

1  Z.O  J 

ft  47fS 

J,OU  J,OU  1 

Irish Potato Percentiles

40   6709    0.22   1448   0.247   2,399,805   358
60   7241    0.23   1687   0.148   1,807,711   250
80   10812   0.35   3761   0.128   5,205,257   481

Cassava Percentiles

20   5755    0.16   916    0.476   2,508,073   436
40   7056    0.20   1377   0.247   2,399,246   340
60   10857   0.30   3259   0.148   5,236,726   482
80   12499   0.35   4319   0.128   6,910,351   553

Model  Estimation 

Estimation of Matrix D

Columns (in order): Maize, Rice, Wheat, Beans, Cassava, Irish Potatoes, MaizeF, RiceF, WheatF, BeansF, CassavaF, Irish PotatoesF

1   0   0   0   0   0   0   0   0   0   0   0
0   1   0   0   0   0   0   0   0   0   0   0
0   0   1   0   0   0   0   0   0   0   0   0
0   0   0   1   0   0   0   0   0   0   0   0
0   0   0   0   1   0   0   0   0   0   0   0
0   0   0   0   0   1   0   0   0   0   0   0

Estimation of Variance Matrix (E Matrix)

Crops          Mean   Reliability   PrioSe = Mean * Re   PrioV = PrioSe^2
Maize          1260   0.01          13                   159
Wheat          1125   0.05          56                   3164
Rice           4031   0.05          202                  40622
Beans          855    0.05          43                   1828
Irish Potato   8055   0.05          403                  162208
Cassava        9326   0.05          466                  217436
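The diagonal entries of the variance matrix follow from the two column formulas in the table above: the prior standard error is the mean times the reliability, and the prior variance is its square. A small sketch that recomputes them is given below; the variable and function names are hypothetical, and the inputs are the values printed in the table.

```python
# (mean yield in Kg/Ha, reliability) per crop, as printed in the table
prior_inputs = {
    "Maize": (1260, 0.01),
    "Wheat": (1125, 0.05),
    "Rice": (4031, 0.05),
    "Beans": (855, 0.05),
    "Irish Potato": (8055, 0.05),
    "Cassava": (9326, 0.05),
}


def prior_variance(mean, reliability):
    """PrioV = PrioSe^2 with PrioSe = mean * reliability."""
    prior_se = mean * reliability
    return prior_se ** 2


variance_diagonal = {
    crop: round(prior_variance(mean, rel))
    for crop, (mean, rel) in prior_inputs.items()
}
print(variance_diagonal)
```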

Estimated Variance Matrix

Maize   Wheat   Rice    Beans   Irish Potato   Cassava
159     0       0       0       0              0
0       3164    0       0       0              0
0       0       40622   0       0              0
0       0       0       1828    0              0
0       0       0       0       162208         0
0       0       0       0       0              217436

Estimation of covariance Matrix (H Matrix)


               Yfj     Ytj     Pi     Reliability   PrioSe   PrioV    Bij^2        PrioV*Bij^2
Maize          1,459   3750    0.39   0.01          0.004    0.0000   14062500     213
Wheat          1,440   3500    0.41   0.05          0.021    0.0004   12250000     5181
Rice           3,980   7000    0.57   0.05          0.028    0.0008   49000000     39600
Beans          829     1850    0.45   0.05          0.022    0.0005   3422500      1716
Irish Potato   7,809   27500   0.28   0.05          0.014    0.0002   756250000    152446
Cassava        9,417   35000   0.27   0.05          0.013    0.0002   1225000000   221679
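The last column of the table above can be approximately recomputed from the first four. The sketch below assumes that PrioSe is the ratio Pi times the reliability, that PrioV is its square, and that Bij^2 is the squared target yield, which is what the printed figures suggest. Small differences from the printed values are expected because the forecasted yields are shown rounded; all names in the code are hypothetical.

```python
# (forecasted yield, target yield, reliability, printed PrioV*Bij^2) per crop
rows = {
    "Maize": (1459, 3750, 0.01, 213),
    "Wheat": (1440, 3500, 0.05, 5181),
    "Rice": (3980, 7000, 0.05, 39600),
    "Beans": (829, 1850, 0.05, 1716),
    "Irish Potato": (7809, 27500, 0.05, 152446),
    "Cassava": (9417, 35000, 0.05, 221679),
}


def covariance_entry(y_forecast, y_target, reliability):
    """PrioV * Bij^2, with Pi = forecast / target and Bij = target (assumed)."""
    ratio = y_forecast / y_target    # Pi
    prior_se = ratio * reliability   # PrioSe
    prior_var = prior_se ** 2        # PrioV
    return prior_var * y_target ** 2


computed = {crop: covariance_entry(yf, yt, rel)
            for crop, (yf, yt, rel, _) in rows.items()}

for crop, (_, _, _, printed) in rows.items():
    print(f"{crop:13s} computed {computed[crop]:12.1f}  printed {printed}")
```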

Estimated Covariance Matrix

Columns (in order): Maize, Wheat, Rice, Beans, Irish Potato, Cassava, MaizeF, WheatF, RiceF, BeansF, Irish PotatoF, CassavaF

213   0      0       0      0        0        0   0   0   0   0   0
0     5181   0       0      0        0        0   0   0   0   0   0
0     0      39600   0      0        0        0   0   0   0   0   0
0     0      0       1716   0        0        0   0   0   0   0   0
0     0      0       0      152446   0        0   0   0   0   0   0
0     0      0       0      0        221679   0   0   0   0   0   0
0     0      0       0      0        0        0   0   0   0   0   0
0     0      0       0      0        0        0   0   0   0   0   0
0     0      0       0      0        0        0   0   0   0   0   0
0     0      0       0      0        0        0   0   0   0   0   0
0     0      0       0      0        0        0   0   0   0   0   0
0     0      0       0      0        0        0   0   0   0   0   0

Estimation of A Matrix (Pi)

Columns (in order): Maize, Rice, Wheat, Beans, Cassava, Irish Potatoes, MaizeF, RiceF, WheatF, BeansF, CassavaF, Irish PotatoesF

-0.39   0       0       0       0       0       1   0   0   0   0   0
0       -0.41   0       0       0       0       0   1   0   0   0   0
0       0       -0.57   0       0       0       0   0   1   0   0   0
0       0       0       -0.45   0       0       0   0   0   1   0   0
0       0       0       0       -0.28   0       0   0   0   0   1   0
0       0       0       0       0       -0.27   0   0   0   0   0   1
-1      0       0       0       0       0       1   0   0   0   0   0
0       -1      0       0       0       0       0   1   0   0   0   0
0       0       -1      0       0       0       0   0   1   0   0   0
0       0       0       -1      0       0       0   0   0   1   0   0
0       0       0       0       -1      0       0   0   0   0   1   0
0       0       0       0       0       -1      0   0   0   0   0   1

Estimation of dT Matrix

Maize   Rice    Wheat   Beans   Cassava   Irish Potatoes
1,459   1,440   3,980   829     7,809     9,417

Academic dishonesty is a disturbing issue in higher education that has been worsening over the years, especially with the appearance of the internet and e-learning. This new technology gives students the opportunity to use online exam banks and term papers and increases their tendency to cheat. This study investigates student academic dishonesty in the context of traditional and distance-learning courses in higher education. A total of 1,365 students enrolled in academic institutes in the U.S.A. and Israel were surveyed to assess their personality and their willingness to commit various acts of academic misconduct. The findings indicate that in both countries dishonest behaviors are greater in face-to-face courses than in online courses. In addition, both American and Israeli students identified with the personality trait of Agreeableness showed a negative correlation with academic dishonesty. Furthermore, Israeli students identified with the personality traits of Conscientiousness and Emotional Stability demonstrated a negative correlation with academic dishonesty. In contrast, the personality trait of Extraversion among American students was found to be positively correlated with academic misconduct. Implications for further research are discussed.

Keywords: Academic Dishonesty, Personality Traits, OCEAN, Online Courses.

Introduction

Academic dishonesty has been described as an act of cheating, deception and violation of rules for personal gain or advantage [1], [2] committed by the student, "a conscious effort to use proscribed data and/or resources on exams or written work submitted for academic credit" [3]. Before engaging in unethical academic behavior, a student has to make a rational decision that the benefit of cheating is worth the risk of getting caught [4]. There are two types of academic dishonesty, active and passive, both of which involve an intention to cheat. Active dishonesty is an act aimed at raising the student's own grade, whereas passive dishonesty is behavior that assists another student in raising his grade [5].

Researchers have shown that academic dishonesty has worsened over the years [6], [7], [4], [8], and that cheating is an epidemic phenomenon across most college campuses [9], [10], [11]. The development of information technology and the accessibility of academic material on the internet have made it easier to engage in cheating and plagiarism [12], [13], [14], [15]. Furthermore, assignments and papers are available for purchase to students who seek them [16].

In addition, the growth of technology encouraged the emergence of online courses and distance education.
According  to  the  National  Center  for  Education  Statistics  [17],  almost  4.3  million  undergraduate  students  are 
participating  in  online  courses  per  year.  There  is  a  notion  that  it  is  easier  to  cheat  when  participating  in  distance 
learning  classes  [18],  and  both  students  and  faculty  are  aware  of  the  intensity  of  this  phenomenon  compared  to 
traditional  courses  [11],  particularly  where  there  is  little  or  no  personal  contact  between  students  and  faculty 
[19],  [20].  Similarly,  Kelley  and  Bonner  [21]  suggested  that  students  who  feel  close  to  their  professors  tend  to 
be  more  honest.  However,  in  the  online  learning  environment  the  ability  for  faculty  to  develop  a  strong  rapport 
with  students  becomes  more  difficult.  Students  who  feel  "distant"  from  others  seem  to  have  higher  tendency  to 
perform  deceptive  behaviors,  such  as  cheating  [22],  [23].  Online  courses,  in  contrast  to  traditional  classroom 
courses,  may  serve  to  exacerbate  these  feelings  of  separation  and,  thus,  may  contribute  to  higher  incidence  of 
academic  dishonesty  [24],  [25]. 

The  research  literature  shows  that  academic  dishonesty  is  influenced  not  only  by  situational  factors 
(circumstantial  and  contextual)  [26],  such  as  the  teaching  method  -  on-line  vs.  traditional  classrooms,  as  we 
mentioned  above,  but  individual  factors  as  well  (demographic,  psychosocial  and  academic  characteristics  of 
students)  [27].  One  of  these  individual  factors  is  related  to  various  personality  traits. 

Students'  Personality  as  a  Predictor  of  Academic  Dishonesty 

Research regarding the relationship between unethical academic behavior and personality traits includes several studies, each of which uses a different measure of academic dishonesty. Hence, the results are often contradictory [28], [29], [30]. Although the 'Big Five' personality traits measure has proved effective in explaining unethical behaviors [31], and would be expected to have a direct impact on the level of students' cheating behavior [32], it is not frequently used in the context of academic dishonesty, and most researchers who did use it addressed only a few traits instead of the whole model [30], [31]. The personality traits of the "Big Five" model are explained below in the context of academic dishonesty.

The Big Five personality traits include Extraversion, Conscientiousness, Agreeableness, Neuroticism and Openness to experience [33]. The personality trait of Extraversion is characterized as the tendency to be talkative, assertive, energetic, sensation-seeking [36] and looking for excitement [35], [36]. Such individuals seek power, status and recognition [37], and therefore socialize with peers [38], [37], [39], [40] and build relationships for future necessity [41]. Introverts (the reverse of extraverts) prefer to be alone, and thus they are less likely to be influenced by others' cheating behavior [42]. In contrast, extraverts are more impulsive and less self-controlled [43]. This tendency makes them more vulnerable to unethical behavior, as they are prone to imitate others [44]. That said, it is important to note that the studies that addressed this trait's effect on academic dishonesty are scarce and their results are contradictory [34]. Some of them have found that extraversion was positively related to cheating behavior and academic dishonesty [45], [46], [47], while others did not find this impact on the level of cheating tendency among students [32].

Conscientiousness describes organized and responsible individuals who think and plan before taking any
action and follow society's rules and norms. Accordingly, the conscientious student may be described as
dependable, achievement-oriented, persistent, responsible and honest [39]. Such a student operates as an
effective regulator of his or her own actions, able to restrain and regulate behavior through "effortful
control", and can thus resist cheating [48] and hold more negative attitudes towards academic dishonesty [49].
The conscientious student acts with high productivity and less deviance [51]. By contrast, a student with
lower conscientiousness is expected to be irresponsible, disorganized and impulsive. These characteristics
might lead to poorer study skills, which in turn might increase the tendency to cheat. Accordingly, research
has found that this trait can predict unethical behaviors [51], [52], [53].

Agreeableness involves cooperating with others and maintaining harmony. Individuals who are high in
agreeableness have a strong ability to create good relationships [54] and to conform to group norms [55]. In
contrast, those who are low in this trait are expected to lack these behaviors. Although research has not found
a significant impact of agreeableness on academic dishonesty in general [32], Williams et al. [34] found that
agreeableness was negatively correlated with one particular unethical academic behavior, plagiarism. Thus, it
can be expected that individuals who are high in agreeableness will follow rules and norms and will be less
likely to engage in deviant behavior.

Neuroticism is a personality trait that describes an individual with non-constructive emotionality [39],
[33]. Not surprisingly, the research literature has associated it with organizational deviance [51] and with a
negative impact on the tendency to engage in unethical academic behaviors and in cheating [34]. Emotional
Stability (the reverse of neuroticism) reflects students' enhanced feeling of competence and sense of
security [39], which allows them to be more relaxed, less worried and less likely to become strained in
stressful conditions such as tests or deadlines. Thus, these students are considered to be less inclined toward
cheating behaviors [49].

Finally, high Openness to Experience includes tendencies toward intellectualism, creativity, imagination,
and broad-mindedness [39], [56], as well as cognitive capability and high training aptitude [39]. Research
findings show that this personality trait is related to academic success and to learning orientation, which
reflects a desire to understand concepts and master material [49]. Furthermore, learning orientation predicted
a lower inclination to cheat [50].

Empirical research has confirmed the relationship between personality traits and the individual tendency
to cheat for Extraversion and for Neuroticism [30]. In addition, low Conscientiousness and low Agreeableness
were found to predict cheating behaviors as well [38]. More recently, Day et al. [49] examined the effects of
Conscientiousness, Emotional Stability, and Openness to Experience on students' attitudes towards cheating,
combined with two context variables, namely classroom culture and pedagogy. The findings showed that while
Conscientiousness was the sole personality measure that directly predicted negative attitudes towards cheating,
Emotional Stability and Openness to Experience also led to negative attitudes towards academic misconduct, but
only when combined with classroom context variables. Based on the above, we hypothesize that there will be
differences in the level of academic dishonesty based on the various personality traits, especially among
e-learners.

Method 

Participants 

The sample consisted of 1,574 participants: 803 from two American academic institutes and 771 from four
Israeli academic institutes. Women made up 65% of the participants and men 35%. Ages ranged from 17 to 59
(mean 26.4 years). Of the participants, 26% were freshmen, 32% sophomores, 20% juniors, 19% seniors, and 3%
graduate students; 46% were Christians, 38% were Jews, and 16% were Muslims. Thirteen percent of the
participants were excluded from the analysis because their surveys were incomplete or carelessly completed.
Therefore, the final data set consisted of 1,365 participants.

Survey  Instrument 

A three-part survey instrument was used in the current study. Part 1 included the TIPI scale developed by
[59], which consisted of 10 items assessing the participants' personality traits. The reliability of this
questionnaire, measured by Cronbach's alpha, was 0.72. Part 2 consisted of questions that examined academic
integrity using the Academic Integrity Inventory [57]. These questions investigated the students' likelihood
to engage in various forms of academic misconduct. The instrument was validated by [57], and the reliability of
this questionnaire, measured by Cronbach's alpha, was 0.75. Part 3 presented a series of socio-demographic
questions.
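
For readers unfamiliar with the reliability coefficient reported above, the following is a minimal sketch of how Cronbach's alpha is computed from item-level responses. It is an illustration only, not the authors' analysis script: the function name `cronbach_alpha` and the small response matrix are hypothetical, and the study's raw survey data are not reproduced here.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])  # number of items in the scale
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's summed score
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses (5 respondents x 4 items on a 1-7 scale), made up for illustration
demo = [
    [5, 6, 5, 6],
    [3, 3, 4, 3],
    [6, 7, 6, 6],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
]
print(f"alpha = {cronbach_alpha(demo):.2f}")
```

Values closer to 1 indicate that the items vary together; the sketch assumes complete responses, so incomplete surveys would have to be dropped first, as was done in this study.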

Procedure 

To encourage the participants to think within the frame of a specific type of course, we administered a
printed version of the survey instrument in the traditional face-to-face courses and an on-line version of the
survey instrument in the e-learning courses. The survey instruments were coded and grouped according to the
location of the participants' college or university (USA or Israel). The questionnaires were distributed at the
end of the courses.

Results 

Table 1 summarizes the results of independent-samples t-test analyses, which indicate statistically
significant differences in students' likelihood to engage in academic dishonesty based on the type of course in
which they were enrolled. Specifically, students in face-to-face courses were more likely to engage in acts of
academic dishonesty than their counterparts in e-learning courses.

Table 1

Differences in academic dishonesty by course type and country

Country   Course type    N     Mean   S.D.   T-Test     F
USA       E-learning     287   1.61   0.52   12.70***   57.16***
          Face-to-Face   470   2.16   0.66
Israel    E-learning     293   1.78   0.60   5.33***
          Face-to-Face   315   2.52   0.65

***P<0.001, **P<0.01, *P<0.05
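
As an illustration of the kind of comparison summarized in Table 1, the sketch below computes a two-sample t statistic directly from group means, standard deviations and sample sizes. The paper does not say whether a pooled-variance or an unequal-variance (Welch) test was used, so the Welch form here is an assumption, and the helper name `welch_t` is hypothetical. Because the tabulated means and standard deviations are rounded, the output is not expected to reproduce the reported statistics exactly.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from summary statistics of two independent groups."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# USA row of Table 1: face-to-face group versus e-learning group
t, df = welch_t(2.16, 0.66, 470, 1.61, 0.52, 287)
print(f"t = {t:.2f}, df = {df:.0f}")
```

A positive t here means the first group (face-to-face) has the higher mean dishonesty score, which is the direction described in the text.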

Based on a multivariate analysis of variance (MANOVA), we found a significant interaction between
country and course type [F(1, 1361) = 57.16, p<0.001].

Table 2

Correlation between personality and academic dishonesty by course type and country

Country          Course type    1        2         3          4          5
Israel (N=608)   On-line        -0.038   -0.149*   -0.125*    -0.246**   -0.068
                 Face to face   -0.090   -0.131*   -0.237**   -0.151**   -0.063
USA (N=757)      On-line        -0.100   -0.090    -0.057     -0.121*    -0.038
                 Face to face   -0.016   -0.040    -0.031     -0.114*    0.105*

***P<0.001, **P<0.01, *P<0.05


Note: 1=Openness to Experience, 2=Emotional Stability, 3=Conscientiousness, 4=Agreeableness,
5=Extraversion

Table 2 shows a significant negative correlation between the personality trait of Agreeableness and
academic dishonesty in both countries, Israel and the USA. In addition, among Israeli students,
Conscientiousness and Emotional Stability were significantly and negatively correlated with academic
dishonesty. A general linear model revealed a significant two-way interaction effect among Israeli students
between course type (on-line vs. face-to-face) and the personality trait of Conscientiousness [F=2.058,
p<0.05] and between course type and the personality trait of Emotional Stability [F=2.047, p<0.05].
Interestingly, Extraversion among American students was positively correlated with academic dishonesty,
indicating that the tendency to be sociable is correlated with a higher inclination to cheat.
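
To show how a correlation coefficient of the size reported in Table 2 can be checked for significance, the sketch below applies the Fisher z-transformation, a standard large-sample approximation. The paper does not state which test produced the significance stars or the exact n behind each coefficient; using the per-group sample size from Table 1 (293 for Israeli e-learning students) is an assumption, and the function name `fisher_z_test` is hypothetical.

```python
import math

def fisher_z_test(r, n):
    """Approximate two-sided test of H0: rho = 0 for a Pearson correlation r
    based on n paired observations, using the Fisher z-transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)      # approximately standard normal under H0
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal tail probability
    return z, p

# Emotional Stability vs. academic dishonesty, Israel, on-line (r from Table 2; n assumed from Table 1)
z, p = fisher_z_test(-0.149, 293)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the coefficient falls below the 0.05 threshold, in line with the single star it carries in the table.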

Discussion  and  Conclusion 

Our research found that there is less overall cheating in virtual than in traditional classroom settings.
These findings are similar to those of [58] and [59], who explained this phenomenon by the notion that
students in virtual settings may have a higher motivation to learn or may be able to learn without depending
on the typical structure of traditional classroom settings.

Our research also indicates that the personality traits of Emotional Stability and Conscientiousness are
negatively related to academic dishonesty. These findings are similar to those of [49] and can be explained by
the notion that students with high Conscientiousness tend to regulate their actions effectively and restrain
inappropriate behaviors [48], [51], [52], [53]. Emotional Stability also leads students to be less inclined
toward cheating behaviors [49] by enhancing feelings of competence and providing a sense of security, which in
turn allow them to cope successfully with stressful situations and conditions [39]. In addition, the
significant negative correlation between Agreeableness and academic dishonesty indicates that the more
cooperative students are with others, the less likely they are to be academically dishonest, since
Agreeableness is associated with the ability to create good relationships and to conform to group norms [54],
[55].

The results of this research showed that these effects were not observed among American students. This
might be explained by cultural differences, as several studies that compared US students with students in
Lebanon [60], China [61] and non-Western countries [62] showed that Americans tend to show less acceptance
of cheating and to possess higher standards with regard to honesty.

The main practical implication and contribution of this research concern student profiling, since we found
that students who use cheating practices are less emotionally stable, less conscientious and less agreeable.
Further research should focus on how to amplify cooperative tasks in online courses in order to reduce
academic dishonesty. Classroom contextual effects, such as those presented in [49], may be worth investigating
in further research as well, since they seem to contribute to the knowledge regarding the effects of
personality traits on attitudes toward cheating [49].